A Proposal for WSD Using Semantic Similarity

Authors

  • Susana Soler
  • Andrés Montoyo
Abstract

The aim of this paper is to describe a new method for the automatic resolution of lexical ambiguity of verbs in English texts, based on the idea of semantic similarity between nouns using WordNet.

1 An outline of our approach

The WSD method proposed in this paper is knowledge-based and consists essentially of disambiguating the sense of the verb that appears in an English sentence. A simple sentence or question can usually be briefly described by an action and an object [1]. For example, the main idea of the sentence "He eats bananas" can be described by the action-object pair "eat-banana". Our method determines which senses of these two words are most similar to each other. For this task we use the concept of semantic similarity [2] between nouns, based on the WordNet [3] hierarchy. In WordNet, the gloss of a verb synset provides a noun-context for that verb, i.e. the possible nouns occurring in the context of that particular verb [1]. The glosses are used here in the same way a corpus is used.

Our method takes as input the verb-noun pair extracted from the sentence. The output is the sense-tagged verb-noun pair, from which we assign the sense of the verb. The algorithm is as follows:

Step 1. Determine all the possible senses of the verb and the noun by using WordNet. Let us denote them by $v_h$ and $n_j$, respectively.

Step 2. For each sense $v_h$ of the verb and all senses $n_j$ of the noun:

2.1. Extract all the glosses from the sub-hierarchy including $v_h$. The sub-hierarchy including a verb sense $v_h$ is determined as follows: consider the hypernym $h_h$ of $v_h$ and take the hierarchy having $h_h$ as its top [1].

2.2. Determine the nouns in these glosses. These constitute the noun-context of the verb. Determine all the possible senses of all these nouns. Let us denote them by $x_i$.

2.3. Obtain the similarity matrix $Sm$ using the semantic similarity, where each element is defined as follows:

$$Sm(i, j) = \mathrm{sim}(x_i, n_j)$$

To determine the semantic similarity $\mathrm{sim}(x_i, n_j)$ between each sense of the nouns extracted from the gloss of the verb and each sense of the input noun, we use the following formula:

$$\mathrm{sim}(x_i, n_j) = 1 - \mathrm{sd}(x_i, n_j)^2$$

$$\mathrm{sd}(x_i, n_j) = \frac{1}{2} \cdot \bigl(\ldots\bigr)$$

[The definition of sd and the rest of the text are truncated in the available excerpt.]
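The construction of the similarity matrix in step 2.3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the hypernym tree `TOY_HYPERNYMS`, the sense labels such as `banana.n.1`, and all function names are hypothetical, and because the paper's definition of sd is cut off in this excerpt, `toy_sd` substitutes a Wu-Palmer-style depth measure as a placeholder. Only the relations Sm(i, j) = sim(x_i, n_j) and sim = 1 - sd^2 come from the text.

```python
from typing import Dict, List, Optional

# Hypothetical toy hypernym tree (child -> parent), standing in for WordNet.
TOY_HYPERNYMS: Dict[str, Optional[str]] = {
    "entity": None,
    "food": "entity",
    "plant": "entity",
    "fruit": "food",
    "meal.n.1": "food",
    "apple.n.1": "fruit",
    "banana.n.1": "fruit",   # banana as a fruit
    "banana.n.2": "plant",   # banana as a plant
}


def path_to_root(sense: str) -> List[str]:
    """Return the chain [sense, parent, ..., root]."""
    path: List[str] = []
    node: Optional[str] = sense
    while node is not None:
        path.append(node)
        node = TOY_HYPERNYMS[node]
    return path


def toy_sd(a: str, b: str) -> float:
    """Placeholder semantic distance (assumption): 1 minus a Wu-Palmer-style score.

    The paper's own sd formula is truncated in the excerpt, so this is a stand-in.
    """
    path_a = path_to_root(a)
    path_b = path_to_root(b)
    ancestors_b = set(path_b)
    # Walking up from a, the first node shared with b is the deepest common ancestor.
    common = next(node for node in path_a if node in ancestors_b)
    depth_common = len(path_to_root(common))
    return 1.0 - 2.0 * depth_common / (len(path_a) + len(path_b))


def sim(x: str, n: str) -> float:
    """Similarity as stated in the text: sim = 1 - sd^2."""
    return 1.0 - toy_sd(x, n) ** 2


def similarity_matrix(gloss_senses: List[str], noun_senses: List[str]) -> List[List[float]]:
    """Sm(i, j) = sim(x_i, n_j): rows are gloss-noun senses, columns are input-noun senses."""
    return [[sim(x, n) for n in noun_senses] for x in gloss_senses]


# Hypothetical senses: x_i from the glosses of one verb sense, n_j for the noun "banana".
gloss_noun_senses = ["apple.n.1", "meal.n.1"]
input_noun_senses = ["banana.n.1", "banana.n.2"]

sm = similarity_matrix(gloss_noun_senses, input_noun_senses)
for x, row in zip(gloss_noun_senses, row_values := sm):
    print(x, [round(value, 3) for value in row])
```

In this toy setting the fruit sense of "banana" scores higher against the food-related gloss nouns than the plant sense does, which is the kind of signal the matrix is meant to expose. How the matrix is then reduced to a single verb sense is not covered by the excerpt, so the sketch stops at the matrix.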

Related resources

Evaluating measures of semantic similarity and relatedness to disambiguate terms in biomedical text

INTRODUCTION: In this article, we evaluate a knowledge-based word sense disambiguation method that determines the intended concept associated with an ambiguous word in biomedical text using semantic similarity and relatedness measures. These measures quantify the degree of similarity or relatedness between concepts in the Unified Medical Language System (UMLS). The objective of this work is to d...

Learning the Latent Semantics of a Concept from its Definition

In this paper we study unsupervised word sense disambiguation (WSD) based on sense definition. We learn low-dimensional latent semantic vectors of concept definitions to construct a more robust sense similarity measure wmfvec. Experiments on four all-words WSD data sets show significant improvement over the baseline WSD systems and LDA based similarity measures, achieving results comparable to ...

Un Sistema de Extracción de Información Basado en Ontologías para Documentos en el Dominio de las Tecnologías de Información An Ontology-Based Information Extractor for Data-Rich Documents in the Information Technology Domain

This paper presents an information extraction method, suitable for data-rich documents, based on the knowledge represented in a domain ontology. The extractor combines a fuzzy string matcher and a word sense disambiguation (WSD) algorithm. The fuzzy string matcher finds mentions of terms combining character-level and token-level similarity measures dealing with non-standardized acronyms and inc...

PUTOP: Turning Predominant Senses into a Topic Model for Word Sense Disambiguation

We extend McCarthy et al.’s predominant sense method to create an unsupervised method of word sense disambiguation that uses topics automatically derived with Latent Dirichlet allocation. Using topic-specific synset similarity measures, we create predictions for each word in each document using only word frequency information. It is hoped that this procedure can improve upon the method for l...

Semantic Similarity Functions in Word Sense Disambiguation

This paper presents a method of improving the results of automatic Word Sense Disambiguation by generalizing nouns appearing in a disambiguated context to concepts. A corpus-based semantic similarity function is used for that purpose, by substituting appearances of particular nouns with a set of the most closely related similar words. We show that this approach may be applied to both supervised...

Publication date: 2002